Customized Trajectory Optimization and Compliant Tracking Control for Passive Upper Limb Rehabilitation

Passive rehabilitation training in the early poststroke period can promote the reshaping of the nervous system. The trajectory should integrate the physicians’ experience and the patient’s characteristics. And the training should have high accuracy on the premise of safety. Therefore, trajectory customization, optimization, and tracking control algorithms are conducted based on a new upper limb rehabilitation robot. First, joint friction and initial load were identified and compensated. The admittance algorithm was used to realize the trajectory customization. Second, the improved butterfly optimization algorithm (BOA) was used to optimize the nonuniform rational B-spline fitting curve (NURBS). Then, a variable gain control strategy is designed, which enables the robot to track the trajectory well with small human–robot interaction (HRI) forces and to comply with a large HRI force to ensure safety. Regarding the return motion, an error subdivision method is designed to slow the return movement. The results showed that the customization force is less than 6 N. The trajectory tracking error is within 12 mm without a large HRI force. The control gain starts to decrease in 0.5 s periods while there is a large HRI force, thereby improving safety. With the decrease in HRI force, the real position can return to the desired trajectory slowly, which makes the patient feel comfortable.


Introduction
With global aging and the increasing number of patients with limb motor dysfunction caused by nerve injury, how to meet the urgent rehabilitation demand and improve the quality of life of patients is a key issue to be solved [1]. Compared to the traditional manual rehabilitation method by therapists, the robot-assisted rehabilitation method has the advantages of high repeatability, high precision, and accurate quantitative evaluation, which has become a research hotspot worldwide [2]. Clinical results have shown that passive rehabilitation training can promote neural remodeling in the initial post-stroke period [3], which helps the muscles of the affected limb regain the ability to contract spontaneously. Traditional rehabilitation training is effective due to the practical experience of the physiotherapist and the real-time interaction. At present, the training trajectory of most rehabilitation robots can only be of some regular curves represented by clear mathematical functions. On the other hand, the training strategy cannot deal with a large HRI force, which might exert damage to the patient. Therefore, it is needed to study the customization of training trajectory and compliant control of passive rehabilitation training.
Emken et al. [4] proposed a teaching control strategy for a lightweight two-degree-offreedom (DOF) lower limb rehabilitation robot. Feng et al. [5] fixed an accelerometer on the lower limbs to collect data during the training stage as expected input for rehabilitation training. Morita et al. [6] adopted an impedance control strategy to drag upper limb rehabilitation robots and used the least square method to fit the original noise data to approximate the intention of the rehabilitation physician. You et al. [7] designed a torque control algorithm based on the self-developed DC motor, where the weight of the robot arm and the joint friction torque of the robot are compensated to achieve easy dragging. Yang et al. [8] directly used the admittance algorithm in Cartesian space to calculate the inverse joint position solution to drag the lower limb rehabilitation robot, without optimizing and tracking the personalized trajectory. However, the doctor and the patient's hands are coupled to the end of the robot, inevitably resulting in an unsmooth trajectory.
As for trajectory optimization, it is generally based on polynomials, including timeoptimal, energy-optimal, and acceleration-optimal methods [9,10]. In this paper, the raw trajectory points are first compressed using the Douglas-Puke method [11] and then the NURBS curve [12] is used to interpolate between the compressed points. Dong improved the solution process of the B-spline for the joint trajectory fitting of the 6R robot [13]. Mei optimized the end trajectory of the 6-DOF high-speed parallel robot by combining the joint minimum acceleration and the B-spline curve. They focused on reducing the acceleration of the joint and the fitting trajectory has a large deformation relative to the original trajectory [14]. However, there is no smoothness optimization of the fitting trajectory.
As for the trajectory tracking control, Wu et al. designed a fuzzy sliding mode controller to achieve position tracking of the exoskeleton upper limb rehabilitation robot [15]. The physiotherapist can manually adjust the admittance parameters of the outer loop according to the patient's condition so that the HRI force is included in the position control. Mushage et al. [16] designed a fuzzy neural network and an error-adaptive nonlinear controller based on state observation to track the trajectory of the 5DOFs upper limb exoskeleton and simulate the performance of the controller. Li et al. [17] designed a robust anti-interference controller to improve the trajectory tracking accuracy of the robot. Jia et al. [18] combined RBF neural network and PID for trajectory tracking control of a lower extremity exoskeleton robot. It can be seen that most researchers aim to improve the tracking accuracy with the complex controller. Good trajectory tracking ability can ensure the training effect, but the larger rigidity may make the patient feel uncomfortable or even injured once there is a large HRI force. Therefore, in a normal situation, the upper limb rehabilitation robot should enable trajectory tracking and, in an emergency, it should ensure safety. Trigili, Emilio et al. [19] use series elastic joint elements to achieve compliance of the rehabilitation robot. Miao et al. [20] design a position controller for passive training with a bilateral end-effector upper limb rehabilitation robot and an adaptive variable parameter controller to achieve compliance. Guo et al. [21] use a reinforcement learning algorithm to design a variable admittance control algorithm to achieve rehabilitation training matching the stiffness characteristics of patients' lower limbs. Among them, the structure and modeling of special flexible components are complex and will have errors. Other compliance control algorithms can modify the parameters online but the algorithms are slightly complex for passive rehabilitation training. Furthermore, the return movement after compliance was not considered before. Therefore, based on the self-developed 3DOFs end-effector upper limb rehabilitation robot, the methods of trajectory demonstration and optimization are proposed here. A variable gain control strategy is designed, enabling the robot to track the trajectory well with small HRI forces and comply with large HRI forces to ensure safety. Moreover, when the HRI force is reduced with a large position error, the position can also return to the desired trajectory with subdivision error, which means that the return action is not too rushed. It will be comfortable for the patient to continue the unfinished passive training.

Upper Limb Rehabilitation Robot System
The upper limb rehabilitation robot used in this paper is a self-developed 3DOF end-effector upper limb rehabilitation robot, which is mainly used for the training of the shoulder and elbow joints. It includes two horizontal rotating joints and a vertical prismatic joint. Figure 1 is a brief diagram of its structure, where Z 1 , Z 2 , and Z 3 represent the axes of the three joints, and q 1 , q 2 , and q 3 are the position variables of the three joints. The first two rotating joints are driven by two AC servomotors equipped with absolute encoders, and the third joint is driven by a double-acting cylinder. A displacement sensor is installed at the end of the cylinder and a three-dimensional force sensor is installed under the cylinder. Figure 2 is the prototype of the rehabilitation robot system. The program is developed with MATLAB software on the host computer. The slave computer executes the compiled control algorithm. The cylinder output force is controlled by a proportional pressure valve. Because of the overall height of the robot, the total stroke of the cylinder is 150 mm, which limits the training range of the shoulder joint in the sagittal plane. Furthermore, to ensure safety, the two rotating joints are mechanically limited. The working range of the robot in the vertical direction is the stroke of the cylinder which is very clear. Therefore, only the working space in the horizontal plane is shown in Figure 3.

Trajectory Customization
While customizing, the patient and the therapist simultaneously exert an interactive force at the end; the resultant force is as follows: where F t represents the force exerted by the physiotherapist and F p represents the force of the patient. Therefore, part of F t should counteract F p so that the robot can move as the therapist wishes. The force F p is variable and difficult to predict. However, for patients who have almost completely lost the ability to move, their interaction force can be assumed to be 0. The robot will move to utilize a force-based admittance control algorithm.
To realize effortless teaching, the rotational friction F f must be compensated. The interaction force F int is then converted into torques τ h of joints according to Equation (2), where J is the Jacobin matrix of the robot. Then the admittance algorithm (3) is applied to generate the desired input of the joint position. Therefore, the robot can rotate when there are interaction forces.
where k is the admittance coefficient and ∆q is the increment of the joint angle.
Regarding the prismatic joint, the static friction force will be tested and compensated according to Equation (4). In addition, the vertical load at the end can be detected and compensated with the help of the force sensor. The force applied by the physiotherapist F PZ is detected by the force sensor at the end and then the amount of pressure change in the rodless chamber is calculated by Equation (5). Therefore, the cylinder with different loads can be easily moved.
where F PZ represents the vertical force exerted by the therapist.
where A 1 is the area of the cylinder's piston, ∆P indicates the amount of change in pressure.

Trajectory Interpolation
A kth-degree NURBS curve defined by n + 1 polygon control vertices can be represented as a segmented rational polynomial function. The point on the NURBS curve for a given parameter u is obtained as follows.
where ω i is the weight factor, which is related to the control points d i . The larger the value of the weight factor, the closer the curve is to the control vertex. The first and last weight factors ω 0 , ω n > 0 and the rest ω i ≥0, which prevent the denominator from being zero, retain the convex wrapping nature and do not degrade the curve to a point due to the weight factor. N i,k (u) is a kth-degree normal B-spline basis function defined by a non-periodic and nonuniform node vector U = [u 0 , u 1 , · · · , u n+k+1 ] deduced from the Cox-De Boor recursive formula expressed as follows.
To make a kth-degree NURBS curve pass through a given set of points P i (i = 0, 1, . . . , n), it is necessary to ensure that the first and last points of the curve coincide with the points P 0 and P n , while ensuring that the nodes u i+k (i = 0, 1, . . . , n) in the curve definition field correspond to P i one-to-one. A kth-degree NURBS curve with n segments will be defined by n + 3 control points D i (i = 0, 1, . . . , n + 2), the weight factors ω i , and the node vector U = [u 0 , u 1 , . . . , u n+k+3 ].
To parameterize compressed points P i , three parameterization methods, named uniform parameterization, cumulative chord length parameterization, and centripetal parameterization, can be used. The second method can accurately reflect the distribution of the points P i and the fitting accuracy is high [22]. Therefore, the cumulative chord length parameterization is used to parameterize the points P i . The method is represented below.
Variables that affect the fitting effect of the NURBS curve are control points, curve nodes, and weight factors [23]. In the experiment, to simplify the calculation, the weight factors are set to 1. The different compression thresholds lead to different initial via points as well as the shape and smoothness of the fitted NURBS curves. Therefore, an intelligent optimization algorithm, with curvature as the objective function and compression threshold as the variable for optimization, is proposed to produce a continuous smooth curve as the training trajectory. The curvature of a point on the NURBS curve is written as follows.
whereṖ(u) andP(u) are the first and second derivatives of the curve that can be calculated according to Leibniz's rule. Simplify Equation (6) as follows.
Therefore, the kth-order derivative is deduced as Finally, the objective function is written as where m represents the number of interpolation points on the curve.
In addition, the trajectory should be limited to the workspace of the robot. According to the inverse kinematics model of the robot, the position q of three joints can be obtained from the coordinate points in Cartesian space. The optimization constraint is expressed as: where q i andq i represent the minimum and maximum limits of the joint i.

Optimization Algorithm
The Butterfly Optimization Algorithm (BOA) is a new type of metaheuristic group intelligence optimization algorithm inspired by the foraging and courtship behavior of butterflies in nature based on sensed fragrance [24]. It includes global search and local search. Compared with some existing metaheuristic algorithms, the basic BOA operation is simple, with few adjusted parameters and good robustness, and it has achieved good results in the preliminary application of engineering practice [25].
The fragrance is formulated as a function of the physical intensity of the stimulus as follows: where f is the perceived magnitude of the fragrance, that is, how strong the fragrance is perceived by other butterflies, c is the sensory modality, I is the stimulus intensity, and a is the power exponent dependent on the modality, which accounts for the variable degree of absorption. Generally, c = 0.01 and a = 0.1. In the case of a maximization problem, the intensity can be proportional to the objective function. Before the BOA enters the local or global search, the algorithm randomly generates the locations of individuals and produces their respective scents accordingly. Each butterfly moves to the current global optimal position g * during the global search phase. The global searching rule is written as: where x t i is the position of the ith butterfly in the tth iteration. Here, g * represents the current best position. The fragrance of the ith butterfly is represented by f i and r 1 is a random number in [0, 1].
The local search phase can be represented as follows.
x t+1 where x t j and x t k are positions of the jth and kth butterflies, respectively. In the butterfly foraging, whether it is in the local search phase or the global search phase, it is determined by the switching probability P static = 0.8. Each iteration compares a random number r 2 ∈ [0, 1] with P static . The final position update formula of the butterfly algorithm is as follows.
To solve the problems of slow convergence speed, low convergence accuracy, and easily falling into local optima of standard BOA, many researchers have improved the algorithms [26][27][28]. The improvements deal with multidimensional optimization problems. In this study, the trajectory data compression algorithm and BOA are combined to reduce the optimization problem from three-dimensional to one-dimensional. Therefore, in the iterative operation process, dynamic switching probability and t-variation strategies are used to improve the convergence speed and accuracy of BOA.
The idea of dynamic switching probability can be expressed in the following expression.
where T max represents the maximum number of iterations andn represents the current number of iterations. In the iterative process, random numbers r 2 1 are replaced with a t-distribution function, preventing local optimization and improving the convergence speed.
The probability density function of the standard t-distribution is as follows.
where n is the freedom of the gamma function Γ.
With the number of iterations of BOA correlated, Equation (19) can be rewritten as wheret =n/T max . In the experiment, let n = 20, the improved BOA algorithm is written as follows.
x t+1 The flow chart of the optimization is shown in Figure 4. Chose four test functions to test the performance of the proposed algorithm. The functions are listed in Table 1. The optimization results are shown in Figure 5. The iterative optimization processes of the improved BOA and the classical BOA are shown in Figure 6. The minimum sum of curvature of the improved BOA is 31.1908 after 100 iterations and the corresponding optimal compression threshold is 24.9157. The results of the classical BOA are 31.3277 and 24.042. It can be seen that the improved algorithm has a better regression speed and accuracy. The blue solid line represents the proposed improved algorithm. The red dashed line is the classical algorithm. It can be seen that improved BOA is better than the classical BOA.

Function
Dimension Interval Minimum  The interpolation curve is shown in Figure 7, where the blue dotted line represents the raw trajectory and the red line represents the optimized interpolation curve.

RBF Net-Based Controller
To realize the training motion planned by the rehabilitation physician, passive rehabilitation training requires good trajectory tracking performance under the premise of ensuring safety. First, a sliding mode controller based on the RBF approximation is designed to ensure good tracking performance in the training process. The dynamic equation of the n-joint robot is as follows.
M(q)q + C(q,q)q + G(q) + F(q) + τ d = τ (22) where M(q) is the n × n positive-definite inertia matrix, C(q,q) is the n × n Coriolis matrix, G(q) is an n × 1 vector of gravity forces, F(q) is an n × 1 vector of friction forces, τ d is the unknown applied interference and satisfies τ d ≤ d, d is the upper bound of τ d , and τ is the control input. Define the tracking error as follows.
Then, by substituting (25) and its derivative into (22), we obtain the simplified expression where Q(x) = M(q)(q d + Λė) + C(q,q)(q d + Λe) + G(q) + F(q). This is the model uncertainty, which needs to be approximated. Since the RBF network has universal approximation characteristics, the RBF neural network is used to approximate the unknown nonlinear function Q(x) and the RBF network algorithm is defined as follows.
where x is the input of the network, j is the jth node of the implicit layer of the network, ϕ(x) is the output of the Gaussian function of the network, and W * is the ideal weight of the network. The approximation error of the network is ε and ε ≤ ε N . The input vector of the network is x = e TėT q T dq T dq T d . The robot control input is designed as below.
where tr W T Ξ −1W is the trace of the matrixW T Ξ −1W and Ξ is a positive diagonal matrix. The derivative of V is written aṡ Substitute (29) andṀ(q) − 2C(q,q) = 0 into (31) to obtain the following expression.
To make the system stable, design the adaptation law as˙W = −˙Ŵ = −Ξϕ(x)r T with Ξ > 0. Additionally, design a robust term v = −(ε N + d) sgn(r) so that we can obtaiṅ V = −r T K v r ≤ 0. According to the LaSalle theorem, the closed-loop system is asymptotic and stable.

Variable Gain Strategy
Because the third joint of the robot is perpendicular to the first two rotating joints, the motion is decoupled between them, which can be controlled separately. Considering the compressibility of the gas, the position control of the cylinder is controlled by the PID combined with a velocity feedforward controller. Because the third joint is driven by air pressure resulting in compliant properties, only the first two joints are designed with a variable gain strategy to cope with excessive HRI force and ensure the safety of the training. Now, define the driving torque of the servomotor as follows.
where τ h is calculated according to Equation (2). Based on the control strategy mentioned above, the control gain is K v of Equation (28). According to the HRI force F h , the variable gain strategy is designed as follows.
where γ is a 2 × 2 positive definite diagonal matrix. This determines the maximum of K v . F h is the resultant force of F X and F Y . This is a scaler. σ 1 is a scaler and determines the working range of F h . The smaller the value of σ 1 , the smaller F h resulting in a maximum of K v .
According to (32) and (33), the value of K v will be small when there is a large HRI force, meaning that τ h becomes the main driving torque and the robot will move away from the desired trajectory. Once the HRI force decreases, K v increases, returning the real position to the desired trajectory to continue training. But if the value of the control gain is large, the return motion will be very fast, causing an uncomfortable feeling. Therefore, design the position error subdivision strategy so that the trajectory tracking process is gradually completed through small errors.
Assume the expected position of the deviated joint is q d , the actual position is q r , and the position error is E. Set a constant χ to divide the error E and each small segment of error will form a transitional expected position q di , as shown in Figure 8. The formula is as follows.
where the ceil() function returns the smallest integer greater than or equal to the specified expression. This function makes the parameter χ vary with the error E. For instance, in the beginning, E is large, making χ large, which ensures the subdivided error is small and the motion is slow. If χ is a fixed constant, the subdivided error will be large in the beginning and very small in the end, which leads to fast motion in the beginning and not being able to return to the desired trajectory in the end. The absolute value of E is taken to prevent χ from being zero. As the error in the tracking process always exists, so the minimum of χ is 1. The coefficient λ amplifies |E| to a value that is greater than 1, thereby avoiding the situation where χ is always 1. The control algorithm proposed here is shown in Figure 9.

Trajectory Customization Experiment
The first two joints are driven by servomotors. Different input voltages lead to different output torque. Therefore, ramp signals are used to test the static friction of the rotation joints. As shown in Figure 10, the voltage that drives joint 1 to start rotating is 2.2v (point A) and that for joint 2 is 3.7v (point B). They are used as compensation for the static friction of the rotating joint. Figure 11 shows the customization force of the first two joints. The blue lines represent the force of the X and Y directions (denoted F X and F Y ), which are measured by the force sensor. The orange lines represent the position of X and Y. Some force is still needed to rotate the robot, as the static friction and inertia forces may vary with the robot configurations. However, the largest force is 5.021 N, which means that it is easy to rotate the first two joints. The number of pulses output by moters  Regarding the prismatic joint, the vertical load at the end can be detected and compensated with the help of the force sensor. The static friction force test is shown in Figure 12. The solid blue line is the force and the crest and trough corresponding to the moments when the cylinder starts to move down and up. So, the force can be regarded as the friction force. The static friction is compensated according to Equation (4). Figure 13

Trajectory Tracking Experiment
Let a = 15, b = 10, σ 1 = 500, so the upper bound of K v is 25. The two joints use the same control parameter. During the experiment, the subject does not exert an active force and the tracking result is shown in Figure 14. The blue dashed line is the desired trajectory and the orange solid line is the real trajectory. Figure 15 shows the tracking performance in the horizontal plane and Figure 16 displays the cylinder tracking result. It can be seen that the tracking error ( Figure 17) is mainly from the third joint. The maximum tracking error is shown in Table 2. From the experiment results, it can be concluded that, without a large HRI force, better trajectory tracking can be achieved, which can meet the needs of passive training.

Compliant Control Experiment
During the experiment, the subjects randomly applied active force and the tracking results are shown in Figure 18. It can be seen that the actual position (solid orange line) deviates from the expected trajectory (blue dashed line). That is, trajectory tracking can adapt to the interaction forces randomly applied by subjects during the training process, verifying the effectiveness of variable parameters. Figure 19 shows the changes in the control parameter K v caused by the interaction force. The black dotted line represents the change in control parameters, the blue solid line represents the force F X in the X direction, and the blue dotted line represents the force F Y in the Y direction. Obviously, as the interaction force increases, the control gain decreases from the maximum value of 25. It can be seen from the enlarged figure of first compliance that the response time of impedance parameters to the interaction force is within 0.5s. At this point, the main torque to drive the robot to move is τ h , so the tracking can be compliant with the interaction force. It also can be seen that K v changes almost simultaneously with the interaction force.  Taking the first large displacement deviation as an example: when t = 7.45 s, the value of K v begins to decrease. From the enlarged image indicated by the arrow, it can be seen that the actual position starts to deviate from the expected position at t = 7.6 s. It can be considered that the controller can comply with the interaction force in 0.1 s. Sample the first desired position at t = 7.88 s, which is the position that needs to be returned to after the first deviation. The actual position returned to −0.656 at t = 13.09 s, while the value of K v returned to 24.98 at t = 9.92 s. It can be concluded that it takes 3.17 (13.09-9.92) seconds to return to the expected position after the significant external force disappears. So, the strategy designed here can comply with large forces quickly and return slowly, ensuring both safety and comfort. Figure 21 shows the displacement of robot joint 2. The three deviation positions are 1.793, 1.471, and 1.401 rad. The same analysis process as for joint 1 can lead to the same conclusion.

Discussion
The customization trajectory combines the training experience of rehabilitation physicians with the different characteristics of patients. Therefore, the training trajectory is of physiological significance and more effective. Hou et al. designed a load-adaptive zero-force control algorithm based on joint torque sensors [29]. However, installing a sensor at every joint will make the structure complex. Additionally, the control algorithm is complicated. The robot system in this study is only equipped with a three-dimensional force sensor at the end to detect the HRI force. Therefore, the vertical load and the gravity of the third link can be detected and compensated directly. As for the first two links of the robot, there is no need to compensate for their gravity because they rotate in the horizontal plane. However, the static friction of the rotating joint is identified and compensated. Meanwhile, the interactive force is converted into the joint space by the Jacobi matrix and then the admittance algorithm is used to achieve easy dragging of the robot. The approach avoids the inverse kinematic model compared with the direct application of the admittance algorithm in Cartesian space. Most studies only filtered the original trajectory data to reduce jitters after obtaining the trajectory [30]. Although the high-frequency noise was removed, some extreme points corresponding to the range of joint motion of the upper limb may be deleted too. In this paper, the data compression algorithm can retain the outermost point of the original trajectory and maintain the topological shape of the trajectory, which will ensure the maximum motion position. Then the improved BOA algorithm is adopted to obtain an interpolation NURBS curve with the smallest sum of curvature. However, the velocity and acceleration planning of the interpolation curve are not carried out in this study and they are just generated by the derivative of the NURBS curve, which can be adjusted by changing the density of interpolating points and the time to complete the training trajectory. The maximum tracking error in trajectory tracking control is within 12 mm, which may be due to the RBF network not being able to accurately approximate the dynamic model of the robot. The compliant control strategy can realize the compliant tracking property and slow returning movement. Compared with the parameter adaptation control strategy [31], the adaptability here may be inferior but the design of the controller is simpler, which is convenient for deployment and application.

Conclusions
In this paper, we introduced the self-developed upper limb rehabilitation training robot system briefly. The robot is a 3 DOFs end-effector robot, with the third prismatic joint vertical to the first two rotating joints. So, the motion of the third joint is uncoupled from the first two joints, which simplified the motion analysis and control. The trajectory customization with human-robot coupling is completed by load and friction compensation and admittance control. The customization forces are within 6 N making it easy to customize a personal training trajectory. The NURBS curve is used to interpolate between the compressed points and the curvature of the trajectory is optimized. It smooths the training trajectory and retains the topological shape of the original trajectory. Finally, the variable control gain algorithm based on the RBF network is designed. The proposed method ensures the trajectory tracking error within 12 mm and compliance with the large HRI force in 0.5 seconds to ensure the safety of passive training. With the method of error subdivision, the return movement after compliance is slow, making the patient feel comfortable. In the future, more useful rehabilitation modes will be studied and designed, especially the assisted-as-needed active rehabilitation strategy that is suitable for the patient who has regained some muscle strength.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All test data mentioned in this paper will be made available upon request to the corresponding author's email with appropriate justification.

Conflicts of Interest:
The authors declared no potential conflict of interest with respect to the research, authorship, and/or publication of this article.